
    Constructing ensembles for intrinsically disordered proteins

    The relatively flat energy landscapes associated with intrinsically disordered proteins make modeling these systems especially problematic. A comprehensive model for these proteins requires one to build an ensemble, consisting of a finite collection of structures and their corresponding relative stabilities, that adequately captures the range of accessible states of the protein. In this regard, methods that use computational techniques to interpret experimental data in terms of such ensembles are an essential part of the modeling process. In this review, we critically assess the advantages and limitations of current techniques and discuss new methods for the validation of these ensembles.
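    As a minimal illustration of what an ensemble of "structures plus relative stabilities" amounts to numerically, the sketch below assumes the stabilities are given as relative free energies, converts them to Boltzmann populations, and averages a per-structure observable. The function names, energies, and observable values are hypothetical; in practice the weights are often inferred from experimental data rather than computed this way.

```python
import math


def boltzmann_weights(free_energies, kT=0.593):
    """Return normalized populations p_i proportional to exp(-G_i / kT).

    free_energies: relative free energies, assumed here to be in kcal/mol.
    kT: thermal energy in the same units (about 0.593 kcal/mol near 298 K).
    """
    g_min = min(free_energies)  # shift by the minimum for numerical stability
    factors = [math.exp(-(g - g_min) / kT) for g in free_energies]
    total = sum(factors)
    return [f / total for f in factors]


def ensemble_average(weights, observable):
    """Population-weighted average of a per-structure observable."""
    return sum(w * x for w, x in zip(weights, observable))


if __name__ == "__main__":
    # Hypothetical three-structure ensemble.
    weights = boltzmann_weights([0.0, 0.5, 1.2])
    print([round(w, 3) for w in weights])
    # Hypothetical per-structure observable, e.g. a radius of gyration in angstroms.
    print(round(ensemble_average(weights, [18.0, 22.0, 30.0]), 2))
```

    Lower free energy means higher population, so the most stable structure dominates the average unless the energy gaps are small compared with kT, which is exactly the flat-landscape situation described above.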

    Deep Metric Learning for the Hemodynamics Inference with Electrocardiogram Signals

    Heart failure is a debilitating condition that affects millions of people worldwide and has a significant impact on their quality of life and mortality rates. An objective assessment of cardiac pressures remains an important method for diagnosis and treatment prognostication in patients with heart failure. Although cardiac catheterization is the gold standard for estimating central hemodynamic pressures, it is an invasive procedure with inherent risks, making it potentially dangerous for some patients. Approaches that leverage non-invasive signals, such as the electrocardiogram (ECG), have the promise to make the routine estimation of cardiac pressures feasible in both inpatient and outpatient settings. Prior models trained to estimate intracardiac pressures (e.g., mean pulmonary capillary wedge pressure (mPCWP)) in a supervised fashion have shown good discriminatory ability but have been limited to the labeled dataset from the heart failure cohort. To address this issue and build a robust representation, we apply deep metric learning (DML) and propose a novel self-supervised DML with distance-based mining that improves the performance of a model with limited labels. We use a dataset that contains over 5.4 million ECGs without concomitant central pressure labels to pre-train a self-supervised DML model, which showed improved classification of elevated mPCWP compared to self-supervised contrastive baselines. Additionally, the supervised DML model that uses ECGs with access to 8,172 mPCWP labels demonstrated significantly better performance on the mPCWP regression task compared to the supervised baseline. Moreover, our data suggest that DML yields models that are performant across patient subgroups, even when some patient subgroups are under-represented in the dataset. Our code is available at https://github.com/mandiehyewon/ssldm
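    The abstract does not spell out its mining rule, so the following is only a generic sketch of deep metric learning with distance-based mining: a "batch-hard" triplet loss that, for each anchor, uses the farthest same-label embedding and the closest different-label embedding. The helper names and the margin value are assumptions, and this should not be read as the authors' self-supervised DML method (the linked repository is the authoritative source for that).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean hinge loss max(0, d(a, p) - d(a, n) + margin) over valid anchors.

    For each anchor, the positive is the FARTHEST embedding with the same label
    and the negative is the CLOSEST embedding with a different label
    (distance-based "batch-hard" mining). In a self-supervised setting the
    labels could, hypothetically, identify augmented views of the same ECG.
    """
    losses = []
    for i, anchor in enumerate(embeddings):
        pos = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == labels[i]]
        neg = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != labels[i]]
        if not pos or not neg:
            continue  # no valid positive or negative for this anchor in the batch
        hardest_pos = max(pos)
        hardest_neg = min(neg)
        losses.append(max(0.0, hardest_pos - hardest_neg + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

    Well-separated clusters give zero loss, while batches where a different-label embedding sits closer than a same-label one are penalized; the mining step concentrates the gradient on exactly those hard cases.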

    A Structure-free Method for Quantifying Conformational Flexibility in proteins

    All proteins sample a range of conformations at physiologic temperatures, and this inherent flexibility enables them to carry out their prescribed functions. A comprehensive understanding of protein function therefore entails a characterization of protein flexibility. Here we describe a novel approach for quantifying a protein’s flexibility in solution using small-angle X-ray scattering (SAXS) data. The method calculates an effective entropy that quantifies the diversity of radii of gyration that a protein can adopt in solution, and it does not require the explicit generation of structural ensembles to garner insights into protein flexibility. Application of this structure-free approach to over 200 experimental datasets demonstrates that the methodology can quantify a protein’s disorder as well as the effects of ligand binding on protein flexibility. Such quantitative descriptions of protein flexibility form the basis of a rigorous taxonomy for the description and classification of protein structure.
    Funding: Massachusetts Institute of Technology (Steve G. and Renee Finn Faculty Innovation Fellowship); Swiss National Science Foundation (Early Postdoc.Mobility Fellowship).
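    To make the idea of an entropy over radii of gyration concrete, the sketch below computes a Shannon entropy of a discretized Rg distribution. It assumes that distribution is already available as bin weights; how the paper derives it from SAXS data, and whether its "effective entropy" uses exactly this definition, is not reproduced here, and the function name is hypothetical.

```python
import math


def rg_entropy(rg_weights):
    """Shannon entropy S = -sum p_i ln p_i (in nats) of a discretized
    radius-of-gyration distribution.

    rg_weights: non-negative weights for Rg bins; they are normalized here,
    and empty bins contribute nothing.
    """
    total = sum(rg_weights)
    if total <= 0:
        raise ValueError("need at least one positive weight")
    entropy = 0.0
    for w in rg_weights:
        p = w / total
        if p > 0:
            entropy -= p * math.log(p)
    return entropy


if __name__ == "__main__":
    compact = [0, 1, 8, 1, 0]   # hypothetical narrow Rg distribution
    extended = [2, 2, 2, 2, 2]  # hypothetical broad Rg distribution
    print(round(rg_entropy(compact), 3), round(rg_entropy(extended), 3))
    # exp(S) can be read as an effective number of distinct Rg states.
    print(round(math.exp(rg_entropy(extended)), 3))
```

    A protein confined to a narrow range of Rg values scores near zero, while one spread evenly over many Rg bins approaches the logarithm of the number of bins, which is the sense in which such a quantity can rank proteins by flexibility without building explicit structural ensembles.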

    Comparative Studies of Disordered Proteins with Similar Sequences: Application to Aβ40 and Aβ42

    Quantitative comparisons of intrinsically disordered proteins (IDPs) with similar sequences, such as mutant forms of the same protein, may provide insights into IDP aggregation, a process that plays a role in several neurodegenerative disorders. Here we describe an approach for modeling IDPs with similar sequences that simplifies the comparison of the ensembles by using a single library of structures. The relative population weights of the structures are estimated using a Bayesian formalism, which provides measures of uncertainty in the resulting ensembles. We applied this approach to compare the ensembles of Aβ40 and Aβ42. Bayesian hypothesis testing finds that although both Aβ species sample β-rich conformations in solution that may represent prefibrillar intermediates, the probability that Aβ42 samples these prefibrillar states is roughly an order of magnitude larger than the frequency with which Aβ40 samples such structures. Moreover, the structure of the soluble prefibrillar state in our ensembles is similar to the experimentally determined structure of Aβ that has been implicated as an intermediate in the aggregation pathway. Overall, our approach for comparative studies of IDPs with similar sequences provides a platform for future studies of the effect of mutations on the structure and function of disordered proteins.
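    A shared structure library simplifies comparison because two ensembles become two weight vectors over the same structures, so a population such as "all β-rich structures" can be compared directly. The sketch below assumes, purely for illustration, that the posterior weights follow Dirichlet distributions with hypothetical concentration parameters, and estimates by Monte Carlo the probability that one ensemble assigns more weight to a chosen subset than the other. It is not the paper's Bayesian formalism, and the function names are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one weight vector from Dirichlet(alphas) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def prob_subset_larger(alphas_a, alphas_b, subset, n_samples=2000, seed=0):
    """Estimate P(pop_A(subset) > pop_B(subset)) and the ratio of the
    posterior-mean subset populations (A over B).

    alphas_a, alphas_b: hypothetical Dirichlet parameters over the SAME library.
    subset: indices of library structures of interest (e.g. beta-rich ones).
    """
    rng = random.Random(seed)
    wins, sum_a, sum_b = 0, 0.0, 0.0
    for _ in range(n_samples):
        w_a = sample_dirichlet(alphas_a, rng)
        w_b = sample_dirichlet(alphas_b, rng)
        pop_a = sum(w_a[i] for i in subset)
        pop_b = sum(w_b[i] for i in subset)
        wins += pop_a > pop_b
        sum_a += pop_a
        sum_b += pop_b
    return wins / n_samples, sum_a / sum_b
```

    With hypothetical parameters `[40, 30, 30]` and `[4, 48, 48]` and `subset=[0]`, the posterior-mean populations of structure 0 are about 0.4 and 0.04, so the estimated ratio is close to 10 and the exceedance probability is close to 1. Because both ensembles index the same library, no structural alignment between ensembles is needed before making this comparison.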

    Intrinsically Disordered Proteins: Where Computation Meets Experiment

    Proteins are heteropolymers that play important roles in virtually every biological reaction. While many proteins have well-defined three-dimensional structures that are inextricably coupled to their function, intrinsically disordered proteins (IDPs) do not have a well-defined structure, and it is this lack of structure that facilitates their function. As many IDPs are involved in essential cellular processes, various diseases have been linked to their malfunction, thereby making them important drug targets. In this review we discuss methods for studying IDPs and provide examples of how computational methods can improve our understanding of IDPs. We focus on two intensely studied IDPs that have been implicated in very different pathologic pathways. The first, p53, has been linked to over 50% of human cancers, and the second, Amyloid-β (Aβ), forms neurotoxic aggregates in the brains of patients with Alzheimer’s disease. We use these representative proteins to illustrate some of the challenges associated with studying IDPs and demonstrate how computational tools can be fruitfully applied to arrive at a more comprehensive understanding of these fascinating heteropolymers.
    Funding: National Science Foundation (U.S.). Directorate for Biological Sciences. Postdoctoral Research Fellowship (Grant 1309247).

    Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series

    Self-supervised learning (SSL) for clinical time series data has received significant attention in recent literature, since these data are rich and provide important information about a patient's physiological state. However, most existing SSL methods for clinical time series are limited in that they are designed for unimodal time series, such as a sequence of structured features (e.g., lab values and vital signs) or an individual high-dimensional physiological signal (e.g., an electrocardiogram). These existing methods cannot be readily extended to model time series that exhibit multimodality, with structured features and high-dimensional data being recorded at each timestep in the sequence. In this work, we address this gap and propose a new SSL method, Sequential Multi-Dimensional SSL, in which an SSL loss is applied both at the level of the entire sequence and at the level of the individual high-dimensional data points in the sequence to better capture information at both scales. Our strategy is agnostic to the specific form of loss function used at each level: it can be contrastive, as in SimCLR, or non-contrastive, as in VICReg. We evaluate our method on two real-world clinical datasets, where the time series contains sequences of (1) high-frequency electrocardiograms and (2) structured data from lab values and vital signs. Our experimental results indicate that pre-training with our method and then fine-tuning on downstream tasks improves performance over baselines on both datasets and, in several settings, can lead to improvements across different self-supervised loss functions.
    Comment: ICML 202
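    The two-level objective described above can be sketched as a sequence-level loss plus an average of per-timestep losses, with the loss passed in as a function so that a contrastive or non-contrastive choice can be swapped in. The example uses NT-Xent, the contrastive loss from SimCLR; the function names, the `step_weight` parameter, and the simple averaging scheme are assumptions rather than the paper's implementation, and the encoders that would produce the embeddings are omitted.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent (SimCLR) loss for two batches of paired embeddings.

    z1[i] and z2[i] are two augmented views of the same item; every other
    embedding in the combined batch acts as a negative.
    """
    z = [_normalize(v) for v in z1 + z2]
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of item i
        sims = [sum(a * b for a, b in zip(z[i], z[k])) / temperature
                for k in range(2 * n)]
        denom = sum(math.exp(s) for k, s in enumerate(sims) if k != i)
        total += -(sims[pos] - math.log(denom))  # -log softmax of the positive
    return total / (2 * n)


def multiscale_ssl_loss(seq_z1, seq_z2, step_z1, step_z2,
                        loss_fn=nt_xent, step_weight=1.0):
    """Sequence-level loss plus step_weight times the mean per-timestep loss.

    seq_z1, seq_z2: [batch][dim] embeddings of two views of each whole sequence.
    step_z1, step_z2: [timestep][batch][dim] embeddings of the individual
    high-dimensional data points (e.g. ECGs) at each timestep.
    loss_fn: any loss with signature loss_fn(z1, z2) -> float.
    """
    seq_loss = loss_fn(seq_z1, seq_z2)
    step_losses = [loss_fn(a, b) for a, b in zip(step_z1, step_z2)]
    if not step_losses:
        return seq_loss
    return seq_loss + step_weight * sum(step_losses) / len(step_losses)
```

    Because `loss_fn` is a parameter, replacing NT-Xent with a VICReg-style function of the same signature changes the objective without touching the two-level structure, which mirrors the loss-agnostic design the abstract describes.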

    Motif Discovery in Physiological Datasets: A Methodology for Inferring Predictive Elements

    In this article, we propose a methodology for identifying predictive physiological patterns in the absence of prior knowledge. We use the principle of conservation to identify activity that consistently precedes an outcome in patients, and describe a two-stage process that allows us to efficiently search for such patterns in large datasets. This involves first transforming continuous physiological signals from patients into symbolic sequences, and then searching for patterns in these reduced representations that are strongly associated with an outcome. Our strategy of identifying conserved activity that is unlikely to have occurred purely by chance in symbolic data is analogous to the discovery of regulatory motifs in genomic datasets. We build upon existing work in this area, generalizing the notion of a regulatory motif and enhancing current techniques to operate robustly on non-genomic data. We also address two significant considerations associated with motif discovery in general: computational efficiency and robustness in the presence of degeneracy and noise. To deal with these issues, we introduce the concept of active regions and new subset-based techniques such as a two-layer Gibbs sampling algorithm. These extensions provide a framework for information inference, where precursors are identified as approximately conserved activity of arbitrary complexity preceding multiple occurrences of an event. We evaluated our solution on a population of patients who experienced sudden cardiac death and attempted to discover electrocardiographic activity that may be associated with the endpoint of death. To assess the predictive patterns discovered, we compared likelihood scores for motifs in the sudden death population against control populations of normal individuals and those with non-fatal supraventricular arrhythmias. Our results suggest that predictive motif discovery may be able to identify clinically relevant information even in the absence of significant prior knowledge.
    Funding: CIMIT (Center for Integration of Medicine and Innovative Technology); Harvard University-MIT Division of Health Sciences and Technology.
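    The two-stage process (symbolize the signal, then score candidate patterns in the symbolic sequences) can be sketched as below. Equal-width amplitude binning and a position-weight-matrix log-odds score against a uniform background are illustrative stand-ins chosen for brevity; they are not the paper's symbolization scheme or its two-layer Gibbs sampler, and all names are hypothetical.

```python
import math


def symbolize(signal, alphabet="abcd"):
    """Map each sample to a symbol using equal-width amplitude bins."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / len(alphabet) or 1.0  # avoid division by zero on flat signals
    return "".join(alphabet[min(int((x - lo) / width), len(alphabet) - 1)]
                   for x in signal)


def build_pwm(instances, alphabet="abcd", pseudocount=0.5):
    """Per-position symbol probabilities from aligned, equal-length motif instances."""
    pwm = []
    for pos in range(len(instances[0])):
        counts = {s: pseudocount for s in alphabet}
        for inst in instances:
            counts[inst[pos]] += 1
        total = sum(counts.values())
        pwm.append({s: c / total for s, c in counts.items()})
    return pwm


def best_match_score(sequence, pwm, alphabet="abcd"):
    """Highest log-odds score (motif vs. uniform background) over all windows.

    Returns -inf if the sequence is shorter than the motif.
    """
    background = 1.0 / len(alphabet)
    k = len(pwm)
    best = float("-inf")
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        score = sum(math.log(pwm[i][c] / background) for i, c in enumerate(window))
        best = max(best, score)
    return best
```

    Comparing the distribution of best-match scores in a case population against control populations mirrors, in simplified form, the likelihood comparison described in the abstract; a motif-discovery step such as Gibbs sampling would be responsible for proposing the instances passed to `build_pwm`.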

    Hidden States within Disordered Regions of the CcdA Antitoxin Protein

    The bacterial toxin–antitoxin system CcdB–CcdA provides a mechanism for the control of cell death and quiescence. The antitoxin protein CcdA is a homodimer composed of two monomers that each contain a folded N-terminal region and an intrinsically disordered C-terminal arm. Binding of the intrinsically disordered C-terminal arm of CcdA to the toxin CcdB prevents CcdB from inhibiting DNA gyrase and thereby averts cell death. Accurate models of the unfolded state of the partially disordered CcdA antitoxin can therefore provide insight into general mechanisms whereby protein disorder regulates events that are crucial to cell survival. Previous structural studies were able to model only two of three distinct structural states, a closed state and an open state, that are adopted by the C-terminal arm of CcdA. Using a combination of free energy simulations, single-pair Förster resonance energy transfer experiments, and existing NMR data, we developed structural models for all three states of the protein. Contrary to prior studies, we find that CcdA samples a previously unknown state in which only one of the disordered C-terminal arms makes extensive contacts with the folded N-terminal domain. Moreover, our data suggest that previously unobserved conformational states play a role in regulating antitoxin concentrations and the activity of CcdA’s cognate toxin. These data demonstrate that intrinsic disorder in CcdA provides a mechanism for regulating cell fate.
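    Single-pair FRET constrains structural models through the Förster relation E = 1/(1 + (r/R0)^6), which maps a donor-acceptor distance r to a transfer efficiency. The sketch below computes that efficiency and a population-weighted mean over several states, as one simple way a multi-state model could be compared with measured efficiencies. The Förster radius, distances, and populations are hypothetical, and the sketch ignores dye-linker flexibility and averaging-regime effects that a real analysis would have to treat.

```python
def fret_efficiency(r, r0):
    """Förster transfer efficiency for donor-acceptor distance r (same units as r0)."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def mean_efficiency(states, r0):
    """Population-weighted mean efficiency over (population, distance) pairs."""
    total = sum(p for p, _ in states)
    return sum(p * fret_efficiency(r, r0) for p, r in states) / total


if __name__ == "__main__":
    # Hypothetical (population, distance in nm) pairs for three states.
    states = [(0.5, 4.0), (0.3, 7.0), (0.2, 5.5)]
    print(round(mean_efficiency(states, r0=5.4), 3))
```

    The sixth-power dependence makes the efficiency most sensitive to distances near R0, which is why distinct states whose dye separations straddle R0 can be distinguished in smFRET data.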